Abstract

Observations of tropical convection from precipitation radar and the concurring
large-scale atmospheric state at two locations (Darwin and Kwajalein)
are used to establish effective stochastic models to parameterise subgrid-scale
tropical convective activity. Two approaches are presented which rely on the
assumption that tropical convection induces a stationary equilibrium distribution.
In the first approach we parameterise convection variables such as
convective area fraction as an instantaneous random realisation conditioned on
the large-scale vertical velocities according to a probability density function
estimated from the observations. In the second approach convection variables are
generated in a Markov process conditioned on the large-scale vertical velocity,
allowing for non-trivial temporal correlations. Despite the different prevalent
atmospheric and oceanic regimes at the two locations, with Kwajalein being
exposed to a purely oceanic weather regime and Darwin exhibiting land-sea
interaction, we establish that the empirical measures for the convective variables
conditioned on large-scale mid-level vertical velocities for the two locations are
close. This allows us to train the stochastic models at one location and then
generate time series of convective activity at the other location. The proposed
stochastic subgrid-scale models adequately reproduce the statistics of the
observed convective variables and we discuss how they may be used in future
scale-independent mass-flux convection parameterisations.

1 Introduction


Despite a remarkable increase in complexity and resolution of general circulation
models (GCMs), the representation of deep convection, which ultimately serves to
drive the general circulation, is still associated with large uncertainties (Flato et al.,
2013). The inadequate representation of atmospheric convection in GCMs is responsible
for considerable uncertainty in estimating climate sensitivity (Bony et al., 2015,
and references therein) and ambiguities in the numerical simulation of the Earth's
climate, for example when comparing the inter-model mean and spread of
hydrological-cycle related variables of the CMIP5 ensemble to observations (e.g. Jiang et al., 2012;
Tian et al., 2013; Lauer and Hamilton, 2013). An improved representation of fundamental
atmospheric processes, such as convection, is therefore considered to be of
utmost priority in the model design (Stevens and Bony, 2013; Jakob, 2014).

Atmospheric convection cannot be resolved by the model grid of GCMs currently
used for climate projections and must therefore be parameterised. More than four
decades ago, the pioneering works of Ooyama (1964) and Manabe et al. (1965) laid
the foundations for the development of increasingly complex convective
parameterisation schemes (see Arakawa (2004) for a review and Randall (2013) for an outlook).
As a result of this development, GCMs are now capable of reliably capturing the
overall amount of precipitation. However, spatial distributions and variance often
compare poorly to observations (e.g. Dai, 2006; Pincus et al., 2008; Stephens et al.,
2010). Further, capturing the statistical relationship between convective activity and
the large-scale environment is a challenging task not often met by current GCMs.
For example, Holloway et al. (2012) show that a model with parameterised convection
does not adequately reproduce the relationship between convective activity and
vertical pressure velocity $\omega$ as found in a cloud-system resolving model with explicit
convection. Using the observational datasets used in this study (cf. Section 2), a
preliminary analysis of the relationship between rain rates and $\omega$ at 500 hPa ($\omega_{500}$) in
a state-of-the-art climate model (ECHAM6.2, cf. Stevens et al., 2013, for a model
description) over Darwin and Kwajalein yields similar negative results, with the
relationship being qualitatively better captured over Kwajalein (not shown).

Conventional convective parameterisations tend to be of a deterministic nature
and represent only the mean effect of the small-scale unresolved convective processes
on the resolved large-scale environment on the scale of the numerical grid. In these
parameterisations, it is assumed that for any given resolved large-scale state of the
atmosphere-ocean system there exists a single possible response at the small-scale
convective state feeding back upon the large-scale state. There is, however, a mounting
body of evidence for non-deterministic relationships between large-scale variables and
convective scales (e.g. Peppler and Lamb, 1989; Sherwood, 1999; Holloway and Neelin,
2009; Stechmann and Neelin, 2011; Davies et al., 2013; Peters et al., 2013).
Furthermore, cloud-resolving models (CRMs) reveal a high
degree of variability of small-scale convective activity for a given large-scale state.
This challenges the usefulness of employing deterministic relationships between
convective activity and large-scale variables (Xu et al., 1992; Cohen and Craig, 2006;
Shutts and Palmer, 2007).


The complex chaotic dynamics of small-scale processes is widely recognised to
give rise to the observed variability. For example, Hohenegger et al. (2006), using
an ensemble of limited-area convection permitting simulations over the European
Alps, identified gravity waves generated in regions of diabatic forcing (i.e. moist
convection) as the main source of error growth in their simulations. A lack of
variability in the high-frequency, small-scale convective processes can dynamically
propagate upscale and cause GCMs to misrepresent low-frequency large-scale
variability (Ricciardulli and Garcia, 2000; Horinouchi et al., 2003). Model simulations
and observations suggest that a stochastic approach to subgrid-scale
parameterisations is needed (Palmer, 2001, 2012). The recent increase of resolution of the
numerical cores adds to the failure of purely deterministic parameterisations: for
example, numerical square grids with edge lengths $\mathcal{O}(100\,\mathrm{km})$ and less do not
contain sufficient cumulus clouds to allow for the estimation of meaningful averages
(Palmer and Williams, 2008), and there is a need for a stochastic resolution-aware
parameterisation (Arakawa et al., 2011; Arakawa and Wu, 2013).


A plethora of stochastic subgrid-scale parameterisations for convection have been
developed. Buizza et al. (1999) applied random perturbations to the parameterised
tendencies in the operational ECMWF Integrated Forecast System (IFS), improving
its forecast skill. Lin and Neelin (2000, 2003) introduced random perturbations to
convective available potential energy (CAPE) and to the heating profile of the host
convective scheme, improving on the statistics of tropical intraseasonal variability.
Bright and Mullen (2002) introduced random perturbations to the trigger function
of the Kain and Fritsch (1990) convection scheme, and Teixeira and Reynolds (2008)
randomly perturbed tendencies from a deterministic convection scheme by sampling
from a normal distribution. Plant and Craig (2008) used random samples of a
distribution of convective plumes to match a required grid-box mean convective mass flux.
Their scheme has been successfully applied to a limited area model-ensemble over
central Europe (Groenemeijer and Craig, 2012). Berner et al. (2005) used ideas from
cellular automata to introduce stochastic forcing to the stream function to model the
effect of mesoscale convective systems. Bengtsson et al. (2013) developed a stochastic
convective parameterisation based on cellular automata via a moisture convergence
closure, and showed that in a limited area model-ensemble framework over
Scandinavia, the parameterisation leads to a desired increase in spread of the
resolved wind field in regions of enhanced deep convection. Majda and Khouider (2002)
and Khouider et al. (2003) drove a mass-flux convective parameterisation with a
stochastic model based on convective inhibition. Khouider et al. (2010) developed
the stochastic multi-cloud model (SMCM) evolving a cloud population consisting
of three cloud types associated with tropical convection (congestus, deep convective
and stratiform clouds) by means of a Markovian process conditioned on the
atmospheric large-scale state. This model has been shown to adequately simulate tropical
convection and associated wave features in a simple two-layer atmospheric model
(e.g. Frenkel et al., 2012, 2013; Deng et al., 2015) and to reproduce observed convective
behaviour when observation-based transition time scales between cloud-types are
adopted (Peters et al., 2013). For a more comprehensive review on current stochastic
subgrid-scale parameterisations of convection see, for example, Neelin et al. (2008)
and Palmer and Williams (2010).

Despite successfully capturing the observed high-frequency variability, stochastic
subgrid-scale parameterisations are often difficult to tune and very sensitive to
the choice of the parameters, as shown for example by Lin and Neelin (2000, 2002,
2003). There has, however, not been much effort in alleviating this difficulty by
imposing observational constraints on the parameterisation. The limited availability
of high-quality, long-term datasets of concurring large-scale and convective scale
observations surely contributes to this omission. We list recent works in that
direction. Neelin et al. (2008) and Stechmann and Neelin (2011) used observed
relationships between column integrated water vapour and precipitation to inform a
physics-based stochastic model to simulate the onset and duration of very strong
convection. Horenko (2011) developed a framework which allows for a purely
data-based Markov chain parameterisation allowing for nonstationary data to model cloud
cover. De La Chevrotiere et al. (2014) used data to infer the transition rates used
in the SMCM by employing a Bayesian framework. Dorrestijn et al. (2013) used
data from large-eddy simulations to design a data-driven multi-cloud model. The
transitions between different cloud types are calculated using Markov chains which
are conditioned on large-scale variables. More recently Dorrestijn et al. (2015) have
successfully employed that model on observational data obtained in Darwin.

We complement here the suite of data-driven stochastic models of tropical
convection by using observations to build a simple entirely observation-based stochastic
model. An entirely observation-based model lacks the transparency of physics-based
models, but is potentially more accurate. We exploit available long-term observations
of the large-scale atmospheric and the concurring small-scale convective state
over Darwin and Kwajalein (Davies et al., 2013). The observations are used to
inform stochastic models for the convective area fraction (CAF) and the rain rate. We
present two stochastic models. In the first model, CAF (or the rain rate) is treated
as an uncorrelated random variable conditioned on the large-scale vertical motion
$\omega_{500}$. To incorporate non-trivial temporal correlations, we propose a second stochastic
model whereby CAF (or the rain rate) is modelled as a Markov chain conditioned
on $\omega_{500}$. The stochastic parameterisations can be constructed at either location and
then be applied to observations of large-scale variables from the respective other
location. Despite the different atmospheric and oceanic conditions of the two geographical
locations, the stochastic models reproduce the observed statistics of the convective
activity such as mean, variance and skewness.

The underlying premise of our approach is that the stationary stochastic process
relating small-scale convective activity and large-scale vertical velocity is sufficiently
universal in the sense that the stochastic model can be transferred from one
geographical location to another one. Using a Kullback-Leibler information criterion
for the conditional probabilities of convective activity as well as quantile regression
for the observational data we establish that for the two regions considered here, it
is sufficient to correct for the large-scale variables by a simple linear translation to
account for the respective ambient atmospheric and oceanic regimes at Darwin and
Kwajalein. It turns out that for mid-level vertical velocities no translation is
required and one can apply the model trained at Kwajalein (Darwin) directly to data
in Darwin (Kwajalein).


Although most stochastic parameterisations involve CAPE, we follow Davies et al.
(2013), Peters et al. (2013) and Dorrestijn et al. (2015) and relate the observed
convective state to $\omega_{500}$. Dorrestijn et al. (2015) find that convection is highly correlated
with column-integrated vertical velocity starting several hours before the onset of
deep convection. This is not surprising as large-scale vertical motion in the tropics
is directly related to deep convection. Conditioning convective states on vertical
motion raises the question of cause-and-effect ambiguities (see e.g. Arakawa, 2004;
Peters et al., 2013, for a discussion). On the one hand, convection induces large-scale
ascending motion through latent heating, which then facilitates further convection.
On the other hand, pre-existing large-scale ascending motion (or convergence) may
facilitate the development of convection (Hohenegger and Stevens, 2013; Birch et al.,
2014) which then further increases large-scale ascending motion. We thus argue that
tropical convection and large-scale ascending motion are intimately linked via a
positive feedback loop, limited by the available energy in the atmospheric column and its
close environment. We stress that the stochastic parameterisation we propose does
not rely on nor presume any cause-and-effect relationship between vertical velocities
and convective activity such as CAF. The models only utilise observed statistical
relationships such as conditional probabilities and transition probabilities.

We use CAF (as well as rain rate data) to characterise convective activity (cf.
Dorrestijn et al. (2013) and Bengtsson et al. (2013)). Our motivation to formulate
the parameterisation with respect to CAF is that it can be used to close convection
schemes since measures of convective activity such as precipitation are linearly
related to the area covered by the precipitation feature (Craig, 1996; Nuijens et al.,
2009; Yano and Plant, 2012; Davies et al., 2013).


Furthermore, parameterisations for CAF can by construction be included in the
framework of resolution-independent parameterisations (Arakawa et al., 2011; Arakawa and Wu,
2013; Wu and Arakawa, 2014). Current mass-flux convection schemes used in
operational GCMs assume the area covered by convective updrafts to be negligible
compared to the cloud-free part of a model grid box - the so-called assumption of
"scale-separation". This assumption breaks down once the resolution of the GCM becomes
high enough such that the area covered by convective updrafts can occupy large parts
of or even an entire grid box. Parameterisations for CAF are naturally scalable and
could be used to mitigate this problem (Arakawa and Wu, 2013; Wu and Arakawa,
2014). Furthermore, most currently employed schemes are mass-flux schemes and
need to predict the vertical mass flux at cloud base. The mass flux at cloud base
could be determined by explicitly assigning an area to the convective updraft
together with an updraft velocity. The effect of convection on the environment could
be implemented by formulating the dependency of the vertical eddy fluxes of
thermodynamic variables on updraft fraction as defined by Arakawa and Wu (2013) and
Wu and Arakawa (2014) or through allowing convectively induced subsidence to impact
on neighbouring grid boxes (Grell and Freitas, 2014). Although using CAF allows
for a certain scale-adaptivity, an increase in resolution would prohibit identifying the
grid-box state as the large-scale environment. In this case, defining the large-scale
environment as the average over a number of surrounding grid-boxes could be used
(e.g. Keane and Plant, 2012).
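
The cloud-base closure mentioned above can be written compactly. The following is only a sketch of the standard mass-flux relation; the symbols $M_b$ (cloud-base mass flux), $\rho$ (air density), $\sigma$ (updraft area fraction, the role played by CAF) and $w_u$ (updraft velocity) are notation assumed here for illustration and do not appear in the original text:

```latex
M_b = \rho \, \sigma \, w_u , \qquad 0 \le \sigma \le 1 .
```

With purely illustrative values $\rho = 1\,\mathrm{kg\,m^{-3}}$, $\sigma = 0.02$ and $w_u = 1\,\mathrm{m\,s^{-1}}$ this gives $M_b = 0.02\,\mathrm{kg\,m^{-2}\,s^{-1}}$. No assumption $\sigma \ll 1$ is needed, which is why a parameterisation of $\sigma$ remains usable as the grid spacing decreases.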


The paper is organised as follows. We introduce the observational datasets along
with a comparison of convective behaviour in Darwin and Kwajalein in Section 2.
We then use the data to construct the stochastic subgrid-scale convection
parameterisations in Section 3. A summary of our results and an outlook to future work are
provided in Section 4. Details on the stochastic convection parameterisations can be
found in Appendices A and B.

2 Data 


2.1 Description of the datasets of tropical convection in Kwajalein and Darwin

We utilise two datasets of observations of the large-scale vertical velocity at 500 hPa
$\omega_{500}$ and of the concurring CAFs and rain rates over tropical locations, averaged to
yield 6-hourly time resolution. The datasets each cover a $190 \times 190$ km$^2$
pentagon-shaped area centred over Darwin (Australia) and Kwajalein (Marshall Islands),
respectively. The area is chosen so as to represent the size of a typical climate model
grid-box. The Kwajalein site is located in the tropical western Pacific and is typical
for a purely tropical oceanic climate. The Darwin site on the other hand is typical
for the monsoon climate of northern Australia and features the complex topography
characteristic of a coastal site.

The area-mean values of atmospheric variables are derived using the method of
Xie et al. (2004), who employ the variational analysis approach of Zhang and Lin
(1997), but use profiles of atmospheric variables from numerical weather prediction
models instead of atmospheric soundings. Here, the variational analysis employs
analyses from ECMWF and is constrained by observations of surface precipitation
obtained from C-band polarimetric (CPOL) research radars (Keenan et al., 1998)
and top-of-the-atmosphere radiation at both locations to reliably balance the column
budgets of mass, heat, moisture and momentum. Davies et al. (2013) show that
constraining the variational analysis by observed rainfall substantially improves the
derived large-scale vertical velocities over the Darwin domain compared to using just
the ECMWF analysis alone.

Over Darwin, the analysis is applied to observational data obtained during three
consecutive wet seasons (2004/2005, 2005/2006, 2006/2007), yielding a total of 1890
6-hour means. Over Kwajalein, the analysis is applied to the time period of May
2008 - Jan 2009, produced to fit into the framework of the Year Of Tropical Convection
virtual field campaign (Waliser and Moncrieff, 2008; Waliser et al., 2012). For
Kwajalein, 1095 6-hour means are available. At both locations, the large-scale
atmospheric data are complemented by data of the concurrent small-scale convective state
derived from CPOL radar observations. The radar observations were used to derive
rain area fractions attributable to either stratiform or convective precipitation after
Steiner et al. (1995). The convective area fraction is then determined as the ratio of
the number of radar pixels classified as "deep convective" with respect to the total
number of pixels. More information regarding the derivation of the datasets can be
found in Davies et al. (2013).
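
As a minimal sketch of the CAF definition above (a ratio of pixel counts), assuming the radar scan is already available as a flat list of class labels; the label strings and the function name `convective_area_fraction` are hypothetical and not taken from the datasets:

```python
def convective_area_fraction(pixel_classes, convective_label="deep convective"):
    """Fraction of radar pixels classified as deep convective.

    pixel_classes: iterable of class labels, one per radar pixel
    (hypothetical representation; the real CPOL products differ).
    """
    pixel_classes = list(pixel_classes)
    total = len(pixel_classes)
    if total == 0:
        raise ValueError("need at least one radar pixel")
    n_convective = sum(1 for c in pixel_classes if c == convective_label)
    return n_convective / total


# Toy scan with 8 pixels, 2 of which are deep convective.
scan = ["deep convective", "stratiform", "clear", "clear",
        "deep convective", "clear", "stratiform", "clear"]
caf = convective_area_fraction(scan)
```

Note that stratiform pixels count towards the total but not towards the numerator, so a scan can have zero CAF and non-zero precipitation.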

By relying on available 6-hourly averaged data, some characteristics of tropical
convection, e.g. the diurnal cycle, are ill-resolved. The advantage of the 6-hourly
averaged data used in this study is that they are self-consistent in the sense that the
large-scale state is determined via the variational analysis and constrained by the
radar observations to satisfy budgets of mass, heat, moisture, and momentum
(Davies et al., 2013). We are not aware of observational data
with higher temporal resolution with the same properties covering a comparable time
period.

The data have already provided important new insights into the behaviour of
tropical convection (Davies et al., 2013; Peters et al., 2013; Kumar et al., 2013). In
particular, Peters et al. (2013) showed that the relationship between convection and
a range of large-scale atmospheric forcing conditions is very similar for both regions
despite their distinctly different atmospheric and oceanic regimes.


2.2 Analysis of the datasets over Kwajalein and Darwin 

To support our premise that the underlying stochastic process relating the small- 
scale convective activity to the large-scale variables is sufficiently independent of 
the geographical location, we contrast here the observed convection at Darwin and 
Kwajalein. 

Figure 1 shows the 2d histograms of CAF and $\omega_{500}$ of the observations in Darwin
and Kwajalein as well as the difference between the two distributions. Throughout
the paper $\omega_{500}$ is given in units of hPa hour$^{-1}$. The plots show strong qualitative
similarities between the two locations which are suggestive of the existence of a universal
relationship which can be utilised to construct stochastic subgrid-scale
parameterisations of CAF conditioned on the large-scale variable $\omega_{500}$. Let us briefly discuss some
of the particularities of the relationships between CAF and $\omega_{500}$ in Kwajalein and
Darwin, as seen in Figure 1. The difference between the distributions (right panel
in Figure 1) shows that Darwin features more convective activity in the range of
$-5 < \omega_{500} < 0$ than Kwajalein. The converse is true for $0 < \omega_{500} < 5$. We attribute
this difference in convective behaviour to the different prevailing meteorological
conditions in Kwajalein and Darwin and the respective different convection initiating
mechanisms. In particular, land-sea breeze induced convective organisation at
Darwin (diurnal cycle), and the generally more inhomogeneous surface characteristics
of the Darwin domain may contribute to different convective responses given a
particular large-scale forcing. For relatively weak large-scale dynamical forcing, i.e. for
$-5 < \omega_{500} < 5$ in our case, land-surface heterogeneities in the Darwin region, such as
coastlines or spatial differences in land cover, can induce subgrid-scale mesoscale
circulations leading to organised convection (e.g. Pielke, 2001; Rieck et al., 2014) which
then results in increased mean large-scale ascent. Because the histograms are normalised,
the increased convective activity in Darwin for negative values of $\omega_{500}$ implies a
concurrent decrease for positive values, as seen in Figure 1. It is worth mentioning that
the observations also include several instances of zero precipitation (236 events out of
1890 observations in Darwin and 28 out of 1095 observations in Kwajalein), and several
instances of zero CAF (194 such events in Darwin and 82 in Kwajalein). Note that
a zero CAF does not imply that there is no precipitation and vice versa. Significant
deep convection is possible for neutral or even mean subsiding conditions as in, for
example, land-sea breezes in the tropics during mean suppressed conditions.

Figure 1: Normalised 2d histograms of CAF and $\omega_{500}$ [hPa/hour] obtained from
observations over Darwin (left) and Kwajalein (middle). The difference of the histograms
is depicted in the rightmost plot.
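
The normalisation argument can be made concrete with a small sketch. The code below builds normalised 2d histograms and their difference using only the Python standard library; the bin widths, function names and the toy numbers are assumptions for illustration and are not the observational data:

```python
import math
from collections import Counter


def normalised_hist2d(omega, caf, d_omega=1.0, d_caf=0.01):
    """Relative frequency of (omega500, CAF) pairs per 2d bin.

    Returns {(i, j): frequency}, where i and j index bins of width
    d_omega and d_caf; empty bins are simply absent from the dict.
    """
    counts = Counter((math.floor(w / d_omega), math.floor(c / d_caf))
                     for w, c in zip(omega, caf))
    n = sum(counts.values())
    return {cell: k / n for cell, k in counts.items()}


def hist_difference(h_a, h_b):
    """Bin-wise difference h_a - h_b over the union of occupied bins."""
    cells = set(h_a) | set(h_b)
    return {cell: h_a.get(cell, 0.0) - h_b.get(cell, 0.0) for cell in cells}


# Made-up samples standing in for the two sites.
omega_darwin = [-3.2, -1.5, -0.4, 0.6, 2.1, -2.7]
caf_darwin = [0.034, 0.012, 0.005, 0.001, 0.0, 0.021]
omega_kwajalein = [-1.1, 0.3, 1.4, 2.2, -0.6]
caf_kwajalein = [0.011, 0.004, 0.002, 0.0, 0.006]

h_darwin = normalised_hist2d(omega_darwin, caf_darwin)
h_kwajalein = normalised_hist2d(omega_kwajalein, caf_kwajalein)
diff = hist_difference(h_darwin, h_kwajalein)
```

Since each histogram sums to one, the entries of `diff` sum to zero: any surplus in one region of the ($\omega_{500}$, CAF) plane must be balanced by a deficit elsewhere.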

Further, Figure 1 shows that the variance of CAF is dependent on the state $\omega_{500}$
and increases with decreasing values of $\omega_{500}$ (not shown). This is consistent with
the result of Craig and Cohen (2006) and Cohen and Craig (2006) that the variance
of convective activity increases with the forcing. Therein the forcing considered was
a range of radiative cooling rates. However, we remark that increased radiative
cooling is typically compensated by increased domain mean mass flux, and therefore
the vertical velocity $\omega_{500}$ is an effective proxy for forcing. Peters et al. (2013) show
that the ratio of the standard deviation and the mean of CAF decreases for sufficiently
negative values of $\omega_{500}$. This suggests that heavy rain events may be viewed as being
deterministic (relative to weaker rain events) with an approximate linear dependency
on $\omega_{500}$. This is particularly evident in the Kwajalein data (Figure 1, middle panel).
An analysis of coarse-grained outputs from the ECMWF IFS shows similar results
for the Darwin region (Watson et al., 2015).

Figure 2: Temporal autocorrelation $C(\tau)$, with $\tau$ in hours, of the CAF time series for
Darwin (blue crosses) and Kwajalein (red circles).

Figure 2 shows that CAF observed at Kwajalein and Darwin has similar
autocorrelation up to lags of 12 hours. For lags longer than 12 hours, convection over
Kwajalein loses memory, whereas convection over Darwin exhibits significant
autocorrelation up to lags of 72 hours and features peaks corresponding to the diurnal
cycle (every 24 hours).
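
A sample autocorrelation of this kind can be sketched as follows. The estimator (normalised by the lag-0 sum) and the toy series are assumptions of this example; with 6-hourly data, lag index `k` corresponds to $\tau = 6k$ hours:

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation C(k) for k = 0..max_lag (lags in samples)."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    if var == 0.0:
        raise ValueError("series is constant")
    return [sum(dev[i] * dev[i + lag] for i in range(n - lag)) / var
            for lag in range(max_lag + 1)]


# Toy 6-hourly CAF series with a pure 24-hour (4-sample) cycle.
caf_series = [0.0, 0.02, 0.05, 0.01] * 10
acf = autocorrelation(caf_series, max_lag=12)
```

For this synthetic series the autocorrelation peaks at lags 4, 8 and 12 (24, 48 and 72 hours), the same signature of a diurnal cycle described for Darwin.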


2.3 Statistical similarity of convective activity 

The comparison of convective behaviour in Darwin and Kwajalein above suggests 
that both locations feature notably different convective behaviour. In this Section we 
will nevertheless establish crucial similarities in the relationship between convective 
activity and large-scale vertical motion which constitute the working hypothesis for 
our stochastic parameterisation schemes. A reader who is just interested in the actual 
stochastic parametrisation may skip this section upon first reading. 

The stochastic subgrid-scale parameterisations proposed in the next Section utilise
conditional probabilities such as $p(\mathrm{CAF}(t)\,|\,\omega_{500}(t))$ describing the probability of
convective activity CAF occurring at time $t$ for given vertical velocity $\omega_{500}$ at that time.
We therefore now compare empirical conditional probabilities for the two locations,
Darwin and Kwajalein, which we denote by $p_{\rm Darwin}$ and $p_{\rm Kwajalein}$, respectively. To
construct the conditional probabilities we bin the ($\omega_{500}$, CAF)-domain into bins of
size (0.1, 0.01).
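
The binning step can be sketched as below: each $\omega_{500}$-bin gets its own normalised histogram over CAF-bins. The default bin sizes follow the text; the dictionary layout, the function name and the sample values are assumptions of this example:

```python
import math
from collections import Counter, defaultdict


def conditional_pmf(omega, caf, d_omega=0.1, d_caf=0.01):
    """Empirical p(CAF bin | omega500 bin) from paired observations.

    Returns {omega_bin: {caf_bin: probability}}; every inner dict sums to 1.
    Values exactly on a bin edge may fall on either side because of
    floating-point rounding.
    """
    counts = defaultdict(Counter)
    for w, c in zip(omega, caf):
        counts[math.floor(w / d_omega)][math.floor(c / d_caf)] += 1
    return {i: {j: k / sum(row.values()) for j, k in row.items()}
            for i, row in counts.items()}


# Made-up observations: three in one omega-bin, two in another.
omega_obs = [-1.23, -1.27, -1.21, 0.44, 0.46]
caf_obs = [0.034, 0.036, 0.012, 0.003, 0.004]
pmf = conditional_pmf(omega_obs, caf_obs)
```

With bins this fine and roughly a thousand observations per site, many $\omega_{500}$-bins hold only a handful of samples, which is one reason to summarise bin-wise comparisons with a robust statistic such as the median.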

Assuming that the different prevailing atmospheric and oceanic regimes impact
directly on the large-scale variables, we consider as a first approximation a uniform
translation of the large-scale vertical velocities. In particular, we show that the
conditional probability functions $p_{\rm Darwin}$ and $p_{\rm Kwajalein}$ are close when the vertical velocities
of Darwin are shifted as in

$$ p_{\rm Kwajalein}(\mathrm{CAF}(t)\,|\,\omega_{500}(t)) \approx p_{\rm Darwin}(\mathrm{CAF}(t)\,|\,\omega_{500}(t) - \Delta_\omega) \qquad (1) $$

or analogously

$$ p_{\rm Darwin}(\mathrm{CAF}(t)\,|\,\omega_{500}(t)) \approx p_{\rm Kwajalein}(\mathrm{CAF}(t)\,|\,\omega_{500}(t) + \Delta_\omega) . \qquad (2) $$


A standard tool to compare probability density functions $P$ and $Q$ is their
Kullback-Leibler distance

$$ D_{\rm KL}(P\|Q) = \int \log\!\left(\frac{P(x)}{Q(x)}\right) P(x)\, dx . \qquad (3) $$

The Kullback-Leibler distance $D_{\rm KL}(P\|Q)$ is defined provided that the support of
the probability function $P$ is contained in the support of $Q$; otherwise it is infinite.
The Kullback-Leibler distance is a non-negative quantity and it is zero if and only if
$P = Q$ (see for example Kantz and Schreiber (1997)).

We will estimate the Kullback-Leibler distance between the conditional probabilities
$p_{\rm Kwajalein}$ and $p_{\rm Darwin}$ for each of the $\omega_{500}$-bins. In Figure 3 we show the median of
these Kullback-Leibler distances $D_{\rm KL}$ as a function of the global shift $\Delta_\omega$. We have
discarded those $\omega_{500}$-bins for which the support of the conditional probability for
Darwin is not contained in the support of that for Kwajalein to allow for finite values
of $D_{\rm KL}$.
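
A discrete version of this procedure might look as follows. The sketch replaces the integral in (3) by a sum over CAF-bins and assumes each conditional distribution is stored as a dict mapping CAF-bin to probability (zero-probability bins absent); the function names and toy distributions are hypothetical:

```python
import math
from statistics import median


def kl_distance(p, q):
    """Discrete D_KL(P||Q) = sum_x P(x) log(P(x)/Q(x)).

    Returns inf when the support of P is not contained in that of Q.
    """
    if any(x not in q for x in p):
        return math.inf
    return sum(px * math.log(px / q[x]) for x, px in p.items())


def median_kl(pmf_p, pmf_q):
    """Median of the finite bin-wise KL distances over shared omega-bins.

    pmf_p, pmf_q: {omega_bin: {caf_bin: probability}}. Bins with an
    infinite distance (support mismatch) are discarded.
    """
    values = [kl_distance(pmf_p[i], pmf_q[i]) for i in pmf_p if i in pmf_q]
    finite = [v for v in values if math.isfinite(v)]
    return median(finite) if finite else math.nan


# Toy conditional distributions for three omega-bins.
pmf_a = {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}, 2: {3: 1.0}}
pmf_b = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}, 2: {0: 1.0}}
d_med = median_kl(pmf_a, pmf_b)
```

To scan a shift, one would rebuild one site's conditional distributions with its $\omega_{500}$ values translated by a trial $\Delta_\omega$ before binning and record `median_kl` for each trial value; the sign convention of that translation has to be chosen to match (1) or (2).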

A quadratic regression yields an optimal shift of $\Delta_\omega = 0.21$ where the minimum
of the Kullback-Leibler distance is attained. The shift $\Delta_\omega$ is given in units of hPa
hour$^{-1}$. In general, the Kullback-Leibler distance is asymmetric with
$D_{\rm KL}(P\|Q) \neq D_{\rm KL}(Q\|P)$. We find, however, that
$D_{\rm KL}(p_{\rm Kwajalein}\|p_{\rm Darwin})$ has a minimum very close
to the same value of $\Delta_\omega$, supporting our approximation that the two conditional
probability functions are related by a simple translation of the vertical velocities. We note
that due to the larger amount of available observations for Darwin ($N = 1890$) when
compared to Kwajalein ($N = 1095$) and due to the larger support of $p_{\rm Kwajalein}$ the
formulation (jS]) is preferred. 
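
The quadratic regression step can be illustrated with a small least-squares fit whose vertex gives the estimated optimal shift. This is a sketch under stated assumptions: the solver is a plain Gauss-Jordan elimination on the normal equations, and the input values are synthetic (generated from a parabola with its minimum placed at 0.21), not the distances shown in Figure 3:

```python
def quadratic_fit(x, y):
    """Least-squares fit y ~ a*x**2 + b*x + c; returns (a, b, c)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]            # power sums
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented normal equations for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def optimal_shift(shifts, kl_values):
    """Location of the minimum of the fitted parabola."""
    a, b, _ = quadratic_fit(shifts, kl_values)
    if a <= 0:
        raise ValueError("fitted parabola has no minimum")
    return -b / (2 * a)


# Synthetic median KL distances for a range of trial shifts.
trial_shifts = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
kl_synthetic = [0.3 + 0.5 * (s - 0.21) ** 2 for s in trial_shifts]
best = optimal_shift(trial_shifts, kl_synthetic)
```

Fitting a parabola rather than taking the smallest sampled value smooths over the scatter of the individual medians, at the cost of assuming the distance is roughly quadratic near its minimum.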

The similarity of the convective behaviour at both locations can be further
examined by performing a median (or 50th-quantile) regression for CAF (see for example
Koenker and Bassett (1978); Grinsted (2008) and Bremnes (2004); Friederichs and Hense
(2008); Mudelsee (2010) for applications of quantile regression methods in the
atmospheric sciences). We determine the conditional median for the observations of
Kwajalein and Darwin using a second-order regression. Using conditional medians
rather than conditional means (as done in normal least square regression) produces
more robust estimates by eliminating the impact of the few very large rain events
and other statistical outliers. The median regressions for Kwajalein and for Darwin
approximately coincide if one translates the $\omega_{500}$ values of Kwajalein by $\Delta_\omega = 0.2$ (or
those of Darwin by $-\Delta_\omega = -0.2$, respectively), as seen in Figure 4, corroborating the
finding of the Kullback-Leibler analysis.
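
To illustrate why a median regression is less sensitive to rare large events than least squares, here is a sketch of a second-order median regression. It uses iteratively reweighted least squares (IRLS) as a simple stand-in for a proper quantile-regression solver; this is a substitution made for the example, not necessarily the method used for Figure 4, and all names and data are hypothetical:

```python
def weighted_quadratic_fit(x, y, w):
    """Weighted least-squares fit y ~ a*x**2 + b*x + c; returns (a, b, c)."""
    s = [sum(wi * xi ** k for xi, wi in zip(x, w)) for k in range(5)]
    t = [sum(wi * yi * xi ** k for xi, yi, wi in zip(x, y, w))
         for k in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def median_regression(x, y, n_iter=50, eps=1e-8):
    """Approximate least-absolute-deviation quadratic fit via IRLS."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        a, b, c = weighted_quadratic_fit(x, y, w)
        # Weights 1/|residual| turn squared errors into absolute errors.
        w = [1.0 / max(abs(yi - (a * xi * xi + b * xi + c)), eps)
             for xi, yi in zip(x, y)]
    return a, b, c


# Synthetic CAF-like data on a parabola, with one extreme event at omega = 0.
omega = [float(v) for v in range(-5, 6)]
caf = [0.002 * w * w - 0.01 * w + 0.02 for w in omega]
caf[5] = 0.5
med_coeffs = median_regression(omega, caf)
ls_coeffs = weighted_quadratic_fit(omega, caf, [1.0] * len(omega))
```

In this toy case the median fit recovers the underlying parabola almost exactly, while the ordinary least-squares intercept is pulled up by the single outlier from 0.02 to about 0.12.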


We remark that the shift $\Delta_\omega$ depends on the height at which $\omega$ is evaluated. We
also analysed observations of the vertical velocity taken at 715 hPa; there the
optimal shift for which the respective quantile regressions were closest and for which the
Kullback-Leibler distance was minimal is found to be $\Delta_\omega \approx 1.67$. We attribute this
uniform shift of the large-scale vertical velocity to the different prevailing
atmospheric-oceanic regimes at the two respective locations as discussed in Section 2.2.
Specifically, land surface effects are expected to exert a stronger influence on atmospheric
variables in the lower (715 hPa) than in the middle (500 hPa) troposphere.

It is by no means clear that the same (possibly non-zero) shift can be applied to
all locations in the tropics. The particular value of $\Delta_\omega$ found from the data in
Darwin and Kwajalein might be different when considering other geographical locations.
Furthermore, it is also not clear that a similarity of the conditional probability
functions exists at all when shifting the vertical velocity for other geographical locations.
This would have to be checked when more data from other locations become
available. If true, such a universality would mean that no costly geographically dependent
fine-tuning would be required in estimating the shift $\Delta_\omega$ for different geographical
locations.


The estimated shift $\Delta_\omega = 0.2$ hPa hour$^{-1}$ we found here for $\omega_{500}$ is small
compared to the range of $\omega_{500}$ and we therefore ignore the shift when comparing data
from Kwajalein with Darwin (and vice versa), unless stated otherwise (note that
non-trivial shifts have to be applied in constructing models conditioned on, let's say,
$\omega_{715}$). To ensure sufficient generality, however, we will present in the following Section
the method for possible non-trivial shifts $\Delta_\omega \neq 0$.

Figure 3: Kullback-Leibler distance between the conditional probability functions
$P_{\mathrm{Darwin}}$ and $P_{\mathrm{Kwajalein}}$ as a function of the shift $\Delta\omega$ (circles).
The minimum of the quadratic least-squares approximation (solid curve) is at
$\Delta\omega = 0.21$.



Figure 4: CAF as a function of the vertical velocities $\omega_{500}$ [hPa/hour] obtained from
observations over Kwajalein (black crosses). The continuous line connecting the circles
(online blue) shows the results of a 2nd-order median regression. The continuous
line connecting the diamonds (online red) shows the result of a 2nd-order median
regression for the Darwin data plotted against $\omega_{500} - 0.2$.

3 Stochastic subgrid-scale parameterisation

We will develop two stochastic subgrid-scale parameterisation schemes for CAF conditioned
on $\omega_{500}$: one in which subgrid-scale convection variables such as CAF are viewed
as instantaneous random variables conditioned on the current value of the large-scale
vertical velocity $\omega_{500}$, and a second approach in which the subgrid-scale variables are
viewed as a conditional Markov chain taking into account non-vanishing temporal
correlations of the subgrid-scale variables (cf. Figure 2). The parametrisation schemes
we propose model tropical convection at any location given only the large-scale values
of $\omega_{500}$ at a given time, without using the small-scale convection variables such as
CAF at that time.

We are given time series consisting of 6-hourly averaged observations of $\omega_{500}$ and
of CAF obtained at Kwajalein and Darwin, which we denote by
$\{\omega_{500,k}\}_{k=1,\dots,N}$ and $\{y_k\}_{k=1,\dots,N}$ with $N = 1890$ for Darwin and
$N = 1095$ for Kwajalein, respectively (cf. Section 2). The statistical similarity of
convective activity established in Section 2.3 suggests that we can generate the stochastic
model from observations of either location and apply it to the other location, without
applying a linear shift to the vertical velocities $\omega_{500}$. We describe the methods for
the situation in which observations obtained in Darwin are used to train the model, which
is then applied to observations of $\omega_{500}$ in Kwajalein, but we will also present
results for the reversed case.

3.1 Instantaneous conditional random variables 

In our first stochastic model convective activity is treated as a sequence of independent
random variables conditioned on the current value of the vertical velocity $\omega_{500}$. The
parameterisation has two components: a training component and an application component.
The training component is performed as follows. Given pairs of observations
for the vertical velocity $\omega_{500}$ and CAF (or the rain rate), we want to associate with
each value of $\omega_{500}$ a range of possible convective events and determine their respective
probabilities of occurrence. We do so by partitioning the $(\omega_{500}, \mathrm{CAF})$-plane into
bins. This defines coarse-grained values $\bar\omega$ for $\omega$. With each coarse-grained value
$\bar\omega$ we can now associate coarse-grained values $\overline{\mathrm{CAF}}$ by averaging CAF
over each bin associated with $\bar\omega$, and estimate their respective conditional
probabilities $P(\overline{\mathrm{CAF}} \mid \bar\omega)$ empirically by recording the frequencies
of CAF in their respective bins. The interested reader is referred to Appendix A for more
details on the model.

We construct the stochastic model with observations from Darwin. We partition
the $(\omega_{500}, y)$-plane into bins of size $(0.8, 0.005)$. Choosing the bin size is a
balancing act between requiring sufficiently small bin sizes to assure accuracy and needing
sufficiently large bin sizes to allow for meaningful statistical averages within a bin.
Choosing the bins requires tuning and depends on the number of observations
available. We have tested that doubling the bin sizes still produces good results.
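
The training step described above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the authors' implementation: the function name
`train_conditional_model`, the choice to start the bins at the minimum of the data, and
the dictionary layout of the result are all hypothetical. It returns the empirical
conditional probabilities and the bin-averaged CAF values for bins of size (0.8, 0.005).

```python
import numpy as np

def train_conditional_model(omega, caf, d_omega=0.8, d_caf=0.005):
    """Estimate P(CAF bin | omega bin) and bin-averaged CAF from paired observations."""
    omega = np.asarray(omega, dtype=float)
    caf = np.asarray(caf, dtype=float)
    # Assumption: bins start at the smallest observed value of each variable.
    w0, y0 = omega.min(), caf.min()
    n_w = int((omega.max() - w0) // d_omega) + 1
    n_y = int((caf.max() - y0) // d_caf) + 1
    # Zero-based bin indices of every observation.
    i = np.clip(((omega - w0) // d_omega).astype(int), 0, n_w - 1)
    n = np.clip(((caf - y0) // d_caf).astype(int), 0, n_y - 1)
    counts = np.zeros((n_w, n_y))
    sums = np.zeros((n_w, n_y))
    np.add.at(counts, (i, n), 1.0)   # number of observations per bin
    np.add.at(sums, (i, n), caf)     # sum of CAF per bin
    n_i = counts.sum(axis=1, keepdims=True)
    # Rows of prob sum to one wherever the omega bin contains data.
    prob = np.divide(counts, n_i, out=np.zeros_like(counts), where=n_i > 0)
    # Coarse-grained CAF value per bin; NaN marks empty bins.
    ybar = np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)
    return {"w0": w0, "y0": y0, "d_omega": d_omega, "d_caf": d_caf,
            "prob": prob, "ybar": ybar}
```

Doubling `d_omega` and `d_caf` in such a sketch is a quick way to check the sensitivity to
the bin size mentioned in the text.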


Since we do not have sufficient data to construct the stochastic model for large
negative values of $\omega_{500}$, we use a deterministic relationship between CAF and
$\omega_{500}$ for observations with $\omega_{500} < -18$ (cf. Peters et al. (2013)). The
deterministic relationship is found by linear regression of the observations to be
$\mathrm{CAF} = -0.0044\,\omega_{500} - 0.011$.
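
As a small worked example of this deterministic branch, the sketch below evaluates the
quoted linear fit. The function and constant names are hypothetical; only the coefficients
and the threshold of -18 hPa/hour are taken from the text.

```python
OMEGA_THRESHOLD = -18.0  # hPa/hour; below this value the linear fit replaces the stochastic model

def caf_deterministic(omega_500):
    """Evaluate CAF = -0.0044 * omega_500 - 0.011 for omega_500 < -18 hPa/hour."""
    if omega_500 >= OMEGA_THRESHOLD:
        raise ValueError("the linear fit is only used for omega_500 < -18 hPa/hour")
    return -0.0044 * omega_500 - 0.011

caf_at_minus_20 = caf_deterministic(-20.0)  # 0.088 - 0.011 = 0.077
```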


To test the effectiveness of our model we now apply the Darwin-trained model to
observations in Kwajalein and generate synthetic time series of CAF conditioned on
the large-scale $\omega_{500}$ observed over Kwajalein. In Figure 5 we show the time series of
the observations of CAF in Kwajalein (top panel) and the corresponding synthetic
time series of the stochastic model using conditional random variables (middle panel).
The model reproduces observed intermittent features of tropical convection. However,
it fails to reproduce periods of sustained non-convection near, for example, $t \approx 200$
and $t \approx 900$. This failure is due to our approach not incorporating any memory or
trends, despite non-vanishing autocorrelations as seen in Figure 2.

To establish a more quantitative comparison, we compare in Figure 6 the empirically
determined probability density functions of CAF for the synthetic time series
and the actual observations. By performing averages over 1,000 realisations of the
stochastic model we have established that the first three moments of CAF in Kwajalein,
the mean $\mu$, the variance $\sigma^2$ and the skewness, are well captured by our
synthetic time series. This is illustrated in Table 1.


Table 1: First three moments (mean $\mu$, variance $\sigma^2$ and skewness) of observed CAF for
Kwajalein and of the synthetic data obtained by the subgrid-scale parameterisations
conditioned on $\omega_{500}$ for the two models trained with observations from Darwin.

                   $\mu$      $\sigma^2$          skewness
observations       0.0066     1.89 × 10^-[?]      4.27
random variable    0.0073     1.80 × 10^-[?]      4.29
Markov chain       0.0066     2.75 × 10^-[?]      4.25
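
The moments reported in the table can be estimated along the following lines. This is a
hedged sketch: the function names are hypothetical, and the use of the population variance
(no bias correction) and of the standardised third central moment for the skewness are
assumptions, since the text does not specify the estimators.

```python
import numpy as np

def first_three_moments(x):
    """Return mean, variance and skewness of a one-dimensional sample."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = x.var()  # population variance (ddof=0); an assumption
    skew = np.mean((x - mu) ** 3) / var ** 1.5  # undefined for constant series (var == 0)
    return mu, var, skew

def averaged_moments(realisations):
    """Average the three moments over several synthetic time series."""
    return np.mean([first_three_moments(r) for r in realisations], axis=0)
```

Passing, say, 1,000 synthetic CAF series to `averaged_moments` would mimic the averaging
over realisations described in the text.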


The numerical results presented above used a stochastic model which was generated
using the observations at Darwin and then applied to observations of large-scale
vertical velocities at Kwajalein to produce the associated convective activity at
Kwajalein. In accordance with the statistical similarity of convective activity
established in Section 2.3, we have also trained the stochastic model on the data observed
at Kwajalein and applied it to observations of large-scale vertical velocities at Darwin
with equal success. The results for the first three moments are shown in Table 2 for
completeness.

Table 2: First three moments (mean $\mu$, variance $\sigma^2$ and skewness) of observed CAF
for Darwin and of the synthetic data obtained by the subgrid-scale parameterisation
conditioned on $\omega_{500}$ for the two models trained with observations from Kwajalein.

                   $\mu$      $\sigma^2$          skewness
observations       0.0080     1.29 × 10^-[?]      2.38
random variable    0.0075     1.45 × 10^-[?]      2.46
Markov chain       0.0083     2.38 × 10^-[?]      2.46

We remark that conditioning the observations on the large-scale variables produces
better estimates of the moments than simply taking the observations of the training
location. For example, the actually observed mean of convective activity in Kwajalein,
$\mu = 0.0066$, is estimated as 0.0073 using instantaneous random variables conditioned on
$\omega_{500}$ (cf. Table 1), whereas if it were just estimated by the mean of the training set
(i.e. the observations of CAF in Darwin) the estimate of the mean of convective activity
would be 0.008 (cf. Table 2).

We obtained similarly good results when parameterising CAF conditioned on
observations of the vertical velocity at 715 hPa (not shown); in this case the vertical
velocities were shifted by $\Delta\omega = 1.67$ (cf. Section 2).

Further, we have constructed synthetic time series of rain rate data consisting of 
random variables conditioned on the vertical velocity and found similarly good results 
(not shown). 


3.2 Conditional Markov chain 


The observational data obtained in Kwajalein and Darwin exhibit non-vanishing temporal
autocorrelations as illustrated in Figure 2. This suggests that a more appropriate
parameterisation of CAF should incorporate dependencies on previous observations
rather than simply conditioning on the present values of the large-scale
variables. The autocorrelation functions of CAF and of $\omega_{500}$ as well as the
cross-correlation function exhibit similar values in Kwajalein and in Darwin for a lag of
one time step (6 hours) (with the autocorrelation function for $\omega_{500}$ exhibiting a much
stronger diurnal cycle), but differ substantially for lags greater than 12 hours. This
suggests that a Markov model trained at one location should adequately capture the
convective behaviour at the other location if conditioned only on the observations of
the previous time step 6 hours ago. As a first step towards incorporating memory
one may construct a Markov chain conditioned on the previous state of the system
(see, for example, Crommelin and Vanden-Eijnden (2008)) or fit an AR(1)
process about an $\omega_{500}$-dependent mean as in Wilks (2005). We follow here the
approach proposed by Crommelin and Vanden-Eijnden (2008) for a conditional Markov
chain. The conditional Markov chain estimates the conditional transition probability
$P(\mathrm{CAF}_k \mid \omega_k, \omega_{k-1}, \mathrm{CAF}_{k-1})$, where $k$ denotes the present time
and $k-1$ the time of the previous observation. The conditioning on the previous time step
takes into account trends in the dynamics of the vertical velocity and accounts for
non-trivial temporal correlations. We first estimate the (unconditional) transition
probability from observations at Darwin. This is achieved again by partitioning the
$(\omega_{500}, \mathrm{CAF})$-plane into bins and counting frequencies of transitions between bins
within one sampling time. The aim is now to use this transition probability to draw random
realisations $\mathrm{CAF}_k$ from this Markov chain for observations
$(\omega_{k-1}, \mathrm{CAF}_{k-1})$ in Kwajalein conditioned on the current observation of the
large-scale velocity $\omega_k$. We refer the interested reader to Appendix B for more details
on practical aspects.

The data-sparse region of large convective activity for $\omega_{500} < -18$ is again treated
with a deterministic relationship as in the instantaneous random variable model
described in Section 3.1. We subdivide the $(\omega_{500}, \mathrm{CAF})$-plane again into bins of
size $(0.8, 0.005)$.
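
The counting of transitions between bins can be sketched as follows. This is an
illustration under stated assumptions rather than the authors' code: the function name
`transition_matrix` is hypothetical, bins are assumed to be pre-labelled with zero-based
integers (for instance `alpha = i + n * n_omega` for the omega bin `i` and the CAF bin
`n`), and the series is assumed to be contiguous in time, so pairs straddling a data gap
would have to be removed beforehand.

```python
import numpy as np

def transition_matrix(bin_labels, n_bins):
    """Row-normalised counts of one-step transitions between flattened bin labels."""
    labels = np.asarray(bin_labels, dtype=int)
    counts = np.zeros((n_bins, n_bins))
    # counts[a, b] is the number of transitions from bin a (time k-1) to bin b (time k).
    np.add.at(counts, (labels[:-1], labels[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of bins that were never visited stay zero instead of producing NaN.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

Rows that remain entirely zero correspond to states from which no transition was observed
in the training data, which is one way the data sparseness discussed below shows up.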

In Figure 5 (bottom panel) we show a time series of the observations of CAF in
Kwajalein and the corresponding data obtained from the conditional Markov chain
which was trained with observations obtained in Darwin. Owing to an insufficient amount
of data, not all transitions could be captured, leading to a shorter synthetic time
series. Only approximately 3/4 of the data points in Kwajalein can be reached by
the Markov chain, and only approximately 60% of those form a time-continuous set
of at least 12 hours. Hence the plot of the time series in Figure 5 suffers from missing
data points along the given time interval. We mention that Dorrestijn et al. (2015)
employed a Markov chain model for the data obtained in Darwin, mitigating the
problem of data sparseness by i) coarse-graining the convective state into different
cloud types at the scale of individual radar pixels, rather than using CAF directly,
and ii) using precipitation area fraction data at very high temporal resolution (10
minutes) in combination with a linearly interpolated version of the 6-hourly large-scale
atmospheric state. We chose not to employ such a linearly interpolated version
of the large-scale data as this eliminates the self-consistency of the dataset.

The empirical probability density functions of CAF are shown in Figure 7 and show
reasonable correspondence. Results for the first three moments of CAF are listed in
Tables 1 and 2. Again, the statistics of the actual observations are reasonably well
reproduced. The variance is overestimated by the Markov chain. This may be due
to the averaging of CAF within the relatively coarse bins (cf. the definition of the
coarse-grained CAF values (4), which is also used in the Markov chain).

4 Summary and Conclusions

In this study, we used observations of tropical deep convection and the concurring
large-scale atmospheric states at two tropical locations, Darwin and Kwajalein, to
design a data-driven stochastic subgrid-scale parameterisation for tropical deep
convection. The parameterisation we propose can be built off-line and then subsequently
implemented at low computational cost. The schemes we proposed assume that
convective activity has been triggered.

Given large-scale variables such as vertical velocity, as provided by the dynamical
core of the host model, our stochastic models can be coupled to an already existing
convection scheme, which is part of the model physics. The important and hard
problem of triggering convection is handled by the host model's convection scheme.
Once convection is triggered, we see the contribution of our stochastic models as
providing the host model's convection scheme with statistically consistent estimates for
the cloud-base mass flux. Properly estimating the cloud-base mass flux is paramount
to determine the overall strength of convection. This can be done in a scalable way
using CAF to determine the convective cloud-base mass flux. The convective cloud-base
mass flux can be estimated as the product of CAF, the air density and the
upward velocity at cloud base, which may be either assumed constant, e.g. 1 m s$^{-1}$,
or estimated from boundary layer characteristics. The upward velocity at
cloud base would be assigned at the beginning of the updraught calculation in the
convection scheme, with CAF providing the link to the large-scale environment.
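
The product form of the cloud-base mass flux can be illustrated with a short sketch. The
function name `cloud_base_mass_flux` is hypothetical, and the air density of
1.1 kg m$^{-3}$ used in the example is an illustrative assumption, not a value from the
text; the default upward velocity of 1 m s$^{-1}$ is the constant mentioned above.

```python
def cloud_base_mass_flux(caf, air_density, w_cloud_base=1.0):
    """Convective cloud-base mass flux in kg m^-2 s^-1 as CAF * density * upward velocity."""
    if not 0.0 <= caf <= 1.0:
        raise ValueError("CAF is an area fraction and must lie in [0, 1]")
    return caf * air_density * w_cloud_base

# Illustrative values only: CAF = 0.01, density = 1.1 kg m^-3, w = 1 m s^-1.
flux = cloud_base_mass_flux(0.01, air_density=1.1)
```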

We presented two diagnostic approaches to stochastically parameterise convective
activity conditioned on large-scale vertical velocity. The first method treated CAF
as an instantaneous random variable conditioned on the current value of $\omega_{500}$. This
method suffers from neglecting non-vanishing autocorrelations present in the
observations and is not able to reproduce periods of sustained convection and non-convection,
for example. The second approach was built around a conditional Markov chain and
incorporates autocorrelations to some degree; this method, however, requires
substantially more data to train the Markov chain as it involves conditioning on the past
observations as well as on the current value of $\omega_{500}$. Given these limitations, our
results are promising. The marginal probability functions of CAF as well as its first
three moments were reasonably well reproduced by both approaches, except for the
variance which was overestimated by the Markov chain. This is particularly remarkable
as the stochastic models were trained with data from one geographical location
and then applied to another geographical location with different atmospheric and
oceanic conditions. In general, we would expect the conditional Markov chain to
provide better diagnostics than the parameterisation consisting of instantaneous random
variables as it accounts for memory effects. In particular we expect the conditional
Markov chain to reproduce the autocorrelations of CAF for time lags less than 6
hours. The Markov chain generated by our observational data sets, however, did not
produce long enough artificial time series of CAF which would allow for a reliable
estimation of the autocorrelation function. To further test the proposed parameterisation
schemes for CAF we will in future work i) use numerical data from high-resolution
cloud resolving models (or larger observational data sets if they become available)
and ii) implement the proposed stochastic models as part of operational convection
parameterisations in comprehensive GCMs.

We have used quantile regression and the Kullback-Leibler test to probe for universality
of the relationship between convective activity and large-scale vertical motion
at 500 hPa, $\omega_{500}$ [hPa/hour], allowing for a simple global shift of the vertical
velocities. Despite markedly different prevalent atmospheric and oceanic regimes at Darwin
and Kwajalein the joint probability density functions were close and did not require a
shift. This implied that the stochastic models can be trained at one geographical
location and then be subsequently applied to the respective other location. For vertical
velocities evaluated at 715 hPa the joint probability density functions were closest,
however, when a simple shift in the vertical velocity was performed. To more accurately
calibrate the required shifts in the vertical velocities and to take into account
the respective atmospheric environments of different geographical locations, numerical
data from high-resolution cloud resolving models could be used as a surrogate for
missing observational data in future research.


We chose to parameterise mainly subgrid-scale CAF because i) it is directly related
to domain mean rainfall and thus total latent heating and ii) assigning a non-zero
area fraction to convective updrafts in a convection scheme relieves the problems
associated with the assumption of “scale-separation” as employed in current convection
schemes (e.g. Arakawa et al. 2011). As described above, our stochastic models could
be efficiently applied to provide statistically consistent estimates of cloud-base mass
flux, essentially providing the closure for mass-flux convection schemes. Such a
convective scheme would be fully scalable with convective updrafts eventually covering
large portions of or even entire grid-boxes. In fact, ongoing work by one of the
authors (KP) shows that such an implementation yields plausible results in a full GCM.
Although CAF is suited for a resolution-independent comprehensive parameterisation
of deep convection, the way the observational data have been obtained involves
a particular spatial scale (i.e. the 190 × 190 km$^2$ pentagon-shaped area considered
here). The observations would have to be adapted for the particular resolution of the
GCM. In that context, using the same data as in the present study, Tikotin (2012)
sub-divided the Kwajalein domain into sub-domains of different size and analysed the
relationship between convective activity and $\omega_{500}$ as a function of the domain size.
While the overall statistical relationships remain identical with decreasing domain
size, the variability of convection given a particular large-scale state increased with
decreasing domain size.

We have developed here stochastic parameterisation schemes for convective activity
which are data-driven. Their attractiveness lies in their simplicity and their ease
of implementation. They can be a useful tool in times when the physics is not
sufficiently well understood and/or resolved by physics-based parameterisation schemes.
However, we would like to end with a word of caution for data-driven parameterisation
schemes in climate models. The models are trained under the assumption of
statistical equilibrium. It is not clear whether the change of global climatic conditions
will leave the statistical relationships between CAF and $\omega_{500}$ constant. These issues
would not apply to parametrisation in numerical weather prediction models.


In climate or numerical weather prediction models the resolved variables, including
the vertical velocities, are updated in time using the convective state, e.g.
vertically resolved heating rates. Establishing whether our data-based stochastic
parametrisation for the convective state can be used successfully in this setting requires
several tests planned for further research. Our premise is that, given a judicious choice
of large-scale variables, convective activity can be parametrised in terms of just these
variables. In this work we chose the vertical velocity at 500 hPa as our large-scale
variable. The stochastic models we proposed here are only practically viable if the number
of those judicious variables is sufficiently small. This is similar to the approach taken by
Dorrestijn et al. (2015), who conditioned a data-driven stochastic multi-cloud model
on large-scale vertical velocity only and were able to adequately simulate observed
convective area fractions. Of course, the strength of atmospheric moist convection
also depends on numerous other variables such as the buoyancy of surface air parcels
and humidity of the mid-troposphere. It is a priori not clear whether conditioning
on just one variable is sufficient. Indeed, one could imagine that by neglecting
the conditioning of the convective state on more variables than just the large-scale
vertical velocity, the error in the stochastic parametrisation for the convection will
eventually be accrued in all large-scale variables during the numerical integration.
This may lead to a detrimental accumulation of errors in a positive feedback loop.
It is planned to test in high-resolution cloud resolving models whether introducing
more than one large-scale variable for the conditioning will be beneficial. In particular,
low- to mid-level moisture might be important as it is known to play a major role
in, for example, the initiation of the Madden-Julian Oscillation; see, for example,
Khouider et al. (2013); Ajayamohan et al. (2013) and references therein. In case more
resolved large-scale variables are needed to condition the stochastic parametrisations
of convective activity, one could use a linear combination of these variables to allow
for a computationally feasible parametrisation.

A Description of the stochastic model using instantaneous random variables


Let us denote by $y$ the subgrid-scale variable, for example CAF or the rain rate. We
partition the range of $\omega_{500}$ into $N_\omega$ intervals $I_\omega^i$ with
$i = 1, \dots, N_\omega$ and the range of the subgrid-scale variables into $N_y$ intervals
$I_y^n$ with $n = 1, \dots, N_y$. This partitions the $(\omega_{500}, y)$-plane into
$N_\omega N_y$ bins. We assume that the time series $\{\omega_{500,k}\}_{k=1,\dots,N}$
and $\{y_k\}_{k=1,\dots,N}$ stem from a stationary process. Coarse-grained CAF values (denoted
by $\overline{\mathrm{CAF}}$ in Section 3.1), conditioned on the large-scale variables
$\omega_{500} \in I_\omega^i$ (denoted by $\bar\omega$ in Section 3.1), are determined as averages
over bins with

\[
\bar{y}^{(n,i)} = \frac{\sum_k y_k \, \mathbf{1}[y_k \in I_y^n] \, \mathbf{1}[\omega_{500,k} \in I_\omega^i]}{N_y^{(n,i)}} ,
\tag{4}
\]

where $N_y^{(n,i)} = \sum_k \mathbf{1}[y_k \in I_y^n] \, \mathbf{1}[\omega_{500,k} \in I_\omega^i]$ is the
number of $y_k$-values belonging to the bin defined as the intersection of the intervals
$I_\omega^i$ and $I_y^n$. Here $\mathbf{1}[\cdot]$ denotes the indicator function with
$\mathbf{1}[y_k \in I_y^n] = 1$ if $y_k \in I_y^n$ and $\mathbf{1}[y_k \in I_y^n] = 0$ otherwise. The
conditional probability $P(n|i)$ of CAF $y_k$ being in the interval $I_y^n$ conditioned on
$\omega_{500,k}$ being in the interval $I_\omega^i$ (denoted by
$P(\overline{\mathrm{CAF}} \mid \bar\omega)$ in Section 3.1) is calculated as

\[
P(n|i) = \frac{\sum_k \mathbf{1}[y_k \in I_y^n] \, \mathbf{1}[\omega_{500,k} \in I_\omega^i]}{N^{(i)}} ,
\tag{5}
\]

where $N^{(i)} = \sum_k \mathbf{1}[\omega_{500,k} \in I_\omega^i]$ is the number of realisations of
$y_k$ for a given value of the large-scale $\omega_{500,k} \in I_\omega^i$. Note that
$\sum_n P(n|i) = 1$. Estimating $\bar{y}^{(n,i)}$ and $P(n|i)$ concludes the training period.

To generate artificial time series of the subgrid-scale variable $y$ conditioned on
$\omega_{500}$ observed at a different geographical location, one simply assigns with probability
$P(n|i)$ the coarse-grained value $\bar{y}^{(n,i)}$.
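
The application step can be sketched in Python as follows. This is a minimal illustration
under stated assumptions: the function name `sample_caf` and its arguments are
hypothetical, the omega bins are assumed to be of equal width `d_omega` starting at `w0`,
and `prob` and `ybar` are assumed to be arrays of shape (number of omega bins, number of
CAF bins) holding the trained conditional probabilities and coarse-grained values. Omega
values falling outside the trained range yield NaN here; the deterministic relationship
used in the main text for strongly negative omega is not included.

```python
import numpy as np

def sample_caf(omega_series, w0, d_omega, prob, ybar, rng):
    """Draw one coarse-grained CAF value per omega value according to P(n|i)."""
    out = np.full(len(omega_series), np.nan)
    for k, w in enumerate(omega_series):
        i = int((w - w0) // d_omega)  # zero-based omega bin of the current value
        if 0 <= i < prob.shape[0] and prob[i].sum() > 0:
            n = rng.choice(prob.shape[1], p=prob[i])  # CAF bin drawn with probability P(n|i)
            out[k] = ybar[i, n]
    return out
```

A random generator such as `np.random.default_rng(seed)` can be passed as `rng` so that
individual realisations are reproducible.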


B Description of the stochastic model using a conditional Markov chain

To construct the Markov chain we determine a transition probability $P_{(i,n)}^{(j,m)}$ which
denotes the probability for the variables $(\omega_{500,k}, y_k)$ to take values in the bin
defined as the intersection of the intervals $I_\omega^j$ and $I_y^m$ at time step $k$ when they
were in the bin defined as the intersection of the intervals $I_\omega^i$ and $I_y^n$ at the
previous time step $k-1$. To construct $P$ as a matrix we arrange the bins into one long
array. The associated $N_\omega N_y \times N_\omega N_y$ transition matrix describing transitions
from bin $\alpha = i + (n-1) N_\omega$ to bin $\beta = j + (m-1) N_\omega$ is then estimated from
the observational data as

\[
P_\alpha^\beta = \frac{T_\alpha^\beta}{\sum_{\beta=1}^{N_\omega N_y} T_\alpha^\beta} ,
\tag{6}
\]

where $T_\alpha^\beta$ counts the number of transitions from the bin labelled with $\alpha$ to
the bin labelled with $\beta$ and is given by

\[
T_\alpha^\beta = \sum_k \mathbf{1}[\omega_{500,k-1} \in I_\omega^i] \, \mathbf{1}[y_{k-1} \in I_y^n]
\times \mathbf{1}[\omega_{500,k} \in I_\omega^j] \, \mathbf{1}[y_k \in I_y^m] .
\]

Estimating the transition matrix $P_\alpha^\beta$ concludes the training phase. To construct
a Markov chain conditioned on $\omega_{500}$ taking a particular value at present time step $k$,
we apply the transition matrix to the given past state $\alpha^*$ at time $k-1$ to calculate
$\pi_{\alpha^*}^\beta = (0, \dots, 1, \dots, 0) P_\alpha^\beta$ where the 1 is in the
$\alpha^*$-th entry. Then we select those $L \le N_y$ bins, i.e. the non-zero coordinates of
$\pi_{\alpha^*}$, which are consistent with the current value $\omega_{500,k}$. These $L$
entries $\pi_{\alpha^*}^{\beta_l}$ with $l = 1, \dots, L$, associated with the current value of
$\omega_{500}$ (if they exist!), do not necessarily sum up to 1 as required for
a probability. Hence we renormalise as follows

\[
\hat\pi_{\alpha^*}^{\beta_l} = \frac{\pi_{\alpha^*}^{\beta_l}}{\sum_{l=1}^{L} \pi_{\alpha^*}^{\beta_l}} .
\tag{7}
\]

The subgrid-scale variable $y_k$ is then randomly chosen from $L$ possible states with
probability $\hat\pi_{\alpha^*}^{\beta_l}$. The assigned values corresponding to the bin
labelled with $\beta_l$ are coarse-grained values obtained by averaging over the bins
analogously to (4).
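
One step of the conditional chain, including the renormalisation, can be sketched as
follows. This is an illustration under stated assumptions, not the authors' code: the
function name `conditional_step` is hypothetical, and bins are flattened with zero-based
indices as `alpha = i + n * n_omega`, which differs from the one-based labelling used in
the appendix only by an index offset.

```python
import numpy as np

def conditional_step(P, alpha_prev, i_now, n_omega, n_y, rng):
    """Sample the next CAF-bin index given the previous bin and the current omega bin.

    Returns None when training contained no transition from alpha_prev into any bin
    that is consistent with the current omega bin i_now.
    """
    row = P[alpha_prev]                            # transition probabilities out of alpha*
    candidates = i_now + n_omega * np.arange(n_y)  # all bins sharing the current omega interval
    weights = row[candidates]
    total = weights.sum()
    if total == 0.0:
        return None
    # Renormalise the selected entries so that they sum to one before sampling.
    return int(rng.choice(n_y, p=weights / total))
```

Returning `None` makes the unreachable states explicit; in a time series these would
appear as the missing data points discussed in the main text.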

Figure 5: Time series of CAF of the observations over Kwajalein (top), of the synthetic
process conditioned on the vertical velocities $\omega_{500}$ described in Section 3.1 (middle)
and of the conditional Markov process described in Section 3.2 (bottom). The
time series generated via the conditional Markov chain has missing data points in the
depicted time interval (see text for details). The plots have a time resolution of 6
hours. Here $t = 0$ corresponds to 1 May 2008 00 UTC.

Figure 6: Empirical histogram of CAF for the observations over Kwajalein (crosses,
online blue) and for the synthetic process conditioned on the vertical velocities
$\omega_{500}$ described in Section 3.1 (circles, online magenta).



Figure 7: Empirical probability density function of CAF for the observations over 
Kwajalein (crosses, online blue) and for the conditional Markov chain model (circles, 
online magenta). 